Introduction

The DIG (Digitalis Investigation Group) Trial was a randomized, double-blind, multicenter trial in which more than 300 centers in the United States and Canada participated. The purpose of the trial was to examine the safety and efficacy of Digoxin in treating patients with congestive heart failure in sinus rhythm. Digitalis was introduced clinically more than 200 years ago and has since become a commonly prescribed medication for the treatment of heart failure; however, there was considerable uncertainty surrounding its safety and efficacy. Small trials indicated that Digoxin alleviated some of the symptoms of heart failure, prolonged exercise tolerance, and generally improved the quality of patients’ lives. Unfortunately, these trials were generally small, and although they did focus on the effect of treatment on patients’ relief from heart failure symptoms and quality of life, they failed to address the effect of treatment on cardiovascular outcomes. The safety of Digoxin was also a concern: Digoxin toxicity is uncommon in small trials with careful surveillance, but the long-term effects of therapeutic levels of Digoxin were less clear.

The DIG dataset consists of baseline and outcome data from the main DIG trial. In the main trial, heart failure patients meeting the eligibility criterion and whose ejection fraction was 45% or less were randomized to receive either a placebo or digoxin. Outcomes assessed in the trial included: cardiovascular mortality, hospitalization or death from worsening heart failure, hospitalization due to other cardiovascular causes and hospitalization due to non-cardiovascular causes.

The DIG dataset was obtained for the purpose of this assignment and is enclosed with this assignment. The codebook associated with the variables is also enclosed with your assignment.

In order to create an anonymous dataset that protects patient confidentiality, most variables have been permuted over the set of patients within treatment group. Therefore, it would be inappropriate to use this dataset for other research or publication purposes.

Instructions

Change my name and id to yours in the YAML header above and complete the tasks below by inserting the required code in the R chunks provided under each task. Then knit the document to generate an HTML document with your solutions.

Data Management

Task 1

  • Read in the csv file DIG.csv provided in your assignment and call it dig.df.

  • Select the following variables from the data: ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP and HYPERTEN, CVD, WHF, DIG, HOSP, HOSPDAYS, DEATH, DEATHDAY.

  • Convert each column to the most appropriate data type, i.e. characters to character, numbers to numeric, etc.

if (!require("janitor")) install.packages("janitor")
## Loading required package: janitor
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
if (!require("tidyverse")) install.packages("tidyverse")
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require("lubridate")) install.packages("lubridate")
if (!require("plotly")) install.packages("plotly")
## Loading required package: plotly
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
if (!require("gganimate")) install.packages("gganimate")
## Loading required package: gganimate
if (!require("gifski")) install.packages("gifski")
## Loading required package: gifski
if (!require("ggvis")) install.packages("ggvis")
## Loading required package: ggvis
## 
## Attaching package: 'ggvis'
## 
## The following object is masked from 'package:gganimate':
## 
##     view_static
## 
## The following objects are masked from 'package:plotly':
## 
##     add_data, hide_legend
## 
## The following object is masked from 'package:ggplot2':
## 
##     resolution
if (!require("gghighlight")) install.packages("gghighlight")
## Loading required package: gghighlight
library(tidyverse)
library(janitor)
library(lubridate)
library(table1)
## 
## Attaching package: 'table1'
## 
## The following objects are masked from 'package:base':
## 
##     units, units<-
library(plotly)
#library(gganimate)
library(gifski)
library(ggvis)
library(gghighlight)
library(survminer)
## Loading required package: ggpubr
library(survival)
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:survminer':
## 
##     myeloma
#importing data
#read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")
## Insert your code here
dig.df <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv") %>%
  janitor::clean_names() 
#dig.df
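
Task 1 also asks for column selection and type conversion, which can be handled right after import. The following is a minimal sketch in base R, assuming a tiny inline CSV (`csv_text`, invented here for illustration) in place of DIG.csv; the object name `demo.df` and all values are hypothetical.

```r
# Hypothetical three-row extract standing in for DIG.csv (values are made up).
csv_text <- "ID,TRTMT,AGE,SEX,BMI,DEATH,DEATHDAY
1,0,66,1,20.1,0,1438
2,1,77,2,25.5,1,1360
3,0,72,1,,0,1391"

demo.df <- read.csv(text = csv_text)

# Keep only the columns of interest.
demo.df <- demo.df[, c("ID", "TRTMT", "AGE", "SEX", "BMI", "DEATH", "DEATHDAY")]

# Identifiers are labels, not quantities, so store them as character.
demo.df$ID <- as.character(demo.df$ID)

# Coded categorical variables become factors; measurements stay numeric.
coded <- c("TRTMT", "SEX", "DEATH")
demo.df[coded] <- lapply(demo.df[coded], factor)

# Inspect the resulting column types.
str(demo.df)
```

The empty BMI field in the third row is read as NA, so the column stays numeric without further handling.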

Task 2

  • Using the codebook provided, label the factor variables in the data appropriately.
## Insert your code here
#cleaning the data and selecting the desired information to work on
dig_new.df <- dig.df %>%
 
   mutate(
     trtmt = factor(trtmt, levels = c(0,1), labels = c("Placebo", "Treatment")), 
     sex = factor(sex, levels = c(1,2), labels = c("Males", "Females")),
     #hyperten = factor(hyperten, levels = c(0,1)), 
     hyperten = factor(hyperten, levels = c(0,1), labels = c("No","Yes")),
     cvd = factor(cvd, levels = c(0,1), labels = c("No","Yes")),
     whf = factor(whf, levels = c(0,1), labels = c("No","Yes")), 
     dig = factor(dig, levels = c(0,1), labels = c("No","Yes")), 
     hosp = factor(hosp, levels = c(0,1), labels = c("No","Yes")), 
     death = factor(death, levels = c(0,1), labels = c("Alive","Death"))
     ) %>%
  
  select(id, trtmt, age, sex, bmi, klevel, creat, diabp, sysbp, hyperten, cvd, whf, dig, hosp, hospdays, death, deathday)

# show tidy new data frame
#dig_new.df
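
One detail of `factor()` is worth checking when labelling coded variables: any value not listed in `levels` is silently turned into NA. A small sketch, assuming a made-up vector `sex_code` in which 9 is an invented code that does not appear in the codebook:

```r
# Hypothetical coded values; 9 is an invented code used only for illustration.
sex_code <- c(1, 2, 2, 1, 9)

sex <- factor(sex_code, levels = c(1, 2), labels = c("Males", "Females"))

# Codes missing from `levels` become NA, so tabulate with NA shown.
table(sex, useNA = "ifany")

# Number of values lost to recoding (NA after labelling but not before).
lost <- sum(is.na(sex)) - sum(is.na(sex_code))
lost
```

Comparing the NA count before and after labelling is a quick way to confirm that the levels passed to `factor()` cover every code present in the data.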

Data Exploration

Use both appropriate summary statistics and visualisation to answer the questions below.

Question 1:

Summarise the number (and proportion) of patients hired within each treatment group.

## Insert your code here
# calculating the proportion  by grouping and summarising
dig_new.df %>%
  group_by(trtmt,sex) %>%
  summarise( count = n(), proportion = (n() / nrow(dig.df))*100, .groups = "drop")
## # A tibble: 4 × 4
##   trtmt     sex     count proportion
##   <fct>     <fct>   <int>      <dbl>
## 1 Placebo   Males    2639       38.8
## 2 Placebo   Females   764       11.2
## 3 Treatment Males    2642       38.9
## 4 Treatment Females   755       11.1
# summarising data using table1
label(dig_new.df$trtmt) <- "Treatment"

table1(~trtmt| sex , data = dig_new.df, caption = "Number of Patients hired within each Treatment Group")
Number of Patients hired within each Treatment Group

                Males           Females         Overall
                (N=5281)        (N=1519)        (N=6800)
Treatment
  Placebo       2639 (50.0%)    764 (50.3%)     3403 (50.0%)
  Treatment     2642 (50.0%)    755 (49.7%)     3397 (50.0%)

Interpretation ….. Allocation is balanced within each sex: 50.0% of males were assigned to placebo and 50.0% to treatment, and among females 50.3% were assigned to placebo and 49.7% to treatment. Overall, 3403 patients (50.0%) received placebo and 3397 (50.0%) received digoxin. Males (N=5281) outnumber females (N=1519) in the sample, but treatment allocation does not appear to differ by sex.

Question 2:

Assess if there are any significant differences in baseline characteristics (e.g. Age, Sex, BMI, …) between the patients assigned to digoxin and patients assigned to placebo and comment on any unusual pattern you see:

## Insert your code here
# table labels
#dig_new.df 
label(dig_new.df$sex) <- "Sex"
label(dig_new.df$age) <- "Age"
label(dig_new.df$bmi) <- "BMI"
label(dig_new.df$klevel)<-"KLEVEL"
label(dig_new.df$creat)<- "CREAT"
label(dig_new.df$diabp)<-"DIABP"
label(dig_new.df$sysbp) <- "SYSBP"
label(dig_new.df$hyperten) <- "HYPERTEN"
label(dig_new.df$cvd) <- "CVD"
label(dig_new.df$whf) <- "WHF"
label(dig_new.df$dig) <- "DIG"
label(dig_new.df$hosp) <- "HOSP"
label(dig_new.df$death) <- "DEATH"

# creating a table of baseline characteristics against treatment using table1 
table1(~ age + sex + bmi + klevel + creat + diabp + sysbp + hyperten + cvd + whf + dig + hosp + death | trtmt, data = dig_new.df, caption = "Summary of Baseline Characteristics for Digoxin and Placebo")
Summary of Baseline Characteristics for Digoxin and Placebo

                      Placebo               Treatment             Overall
                      (N=3403)              (N=3397)              (N=6800)
Age
  Mean (SD)           63.5 (10.8)           63.4 (11.0)           63.5 (10.9)
  Median [Min, Max]   65.0 [22.0, 90.0]     64.0 [21.0, 90.0]     65.0 [21.0, 90.0]
Sex
  Males               2639 (77.5%)          2642 (77.8%)          5281 (77.7%)
  Females             764 (22.5%)           755 (22.2%)           1519 (22.3%)
BMI
  Mean (SD)           27.2 (5.19)           27.0 (5.19)           27.1 (5.19)
  Median [Min, Max]   26.6 [14.4, 62.7]     26.4 [15.2, 58.3]     26.5 [14.4, 62.7]
  Missing             1 (0.0%)              0 (0%)                1 (0.0%)
KLEVEL
  Mean (SD)           4.46 (7.87)           4.33 (0.511)          4.40 (5.57)
  Median [Min, Max]   4.30 [0, 434]         4.30 [0, 6.30]        4.30 [0, 434]
  Missing             410 (12.0%)           391 (11.5%)           801 (11.8%)
CREAT
  Mean (SD)           1.29 (0.372)          1.28 (0.366)          1.29 (0.369)
  Median [Min, Max]   1.21 [0.100, 3.05]    1.20 [0.500, 3.76]    1.20 [0.100, 3.76]
DIABP
  Mean (SD)           74.9 (11.1)           74.9 (11.5)           74.9 (11.3)
  Median [Min, Max]   75.0 [38.0, 140]      75.0 [25.0, 184]      75.0 [25.0, 184]
  Missing             3 (0.1%)              2 (0.1%)              5 (0.1%)
SYSBP
  Mean (SD)           126 (19.9)            126 (19.9)            126 (19.9)
  Median [Min, Max]   124 [74.0, 202]       122 [78.0, 220]       123 [74.0, 220]
  Missing             2 (0.1%)              1 (0.0%)              3 (0.0%)
HYPERTEN
  No                  1846 (54.2%)          1869 (55.0%)          3715 (54.6%)
  Yes                 1557 (45.8%)          1527 (45.0%)          3084 (45.4%)
  Missing             0 (0%)                1 (0.0%)              1 (0.0%)
CVD
  No                  1553 (45.6%)          1703 (50.1%)          3256 (47.9%)
  Yes                 1850 (54.4%)          1694 (49.9%)          3544 (52.1%)
WHF
  No                  2223 (65.3%)          2487 (73.2%)          4710 (69.3%)
  Yes                 1180 (34.7%)          910 (26.8%)           2090 (30.7%)
DIG
  No                  3372 (99.1%)          3330 (98.0%)          6702 (98.6%)
  Yes                 31 (0.9%)             67 (2.0%)             98 (1.4%)
HOSP
  No                  1121 (32.9%)          1213 (35.7%)          2334 (34.3%)
  Yes                 2282 (67.1%)          2184 (64.3%)          4466 (65.7%)
DEATH
  Alive               2209 (64.9%)          2216 (65.2%)          4425 (65.1%)
  Death               1194 (35.1%)          1181 (34.8%)          2375 (34.9%)
# importing the data and selecting the desired characteristics
# dig_plot.df <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv") 
# dig_plot.df %>%
#   select(ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP)
# removing missing values
#dig_plot.df <- na.omit(dig_plot.df)
#dig_plot.df
#parallel coordinate plot
# graph <- dig_plot.df %>% 
#   plot_ly(type = 'parcoords',
#           line = list(color = dig_plot.df$TRTMT,
#                       colorscale = list(c(0, 'purple'), c(1, 'orange')),
#                       showscale = T),
#   dimensions = list(
#     list(tickvals = c(0, 1), ticktext = c('Placebo', 'Treatment'), label = "TRTMT", values = dig_plot.df$TRTMT),
#     list(range = c(min(dig_plot.df$AGE), max(dig_plot.df$AGE)), label = "AGE", values = dig_plot.df$AGE),
#     list(range = c(min(dig_plot.df$BMI), max(dig_plot.df$BMI)), label = "BMI", values = dig_plot.df$BMI),   
#     list(range = c(min(dig_plot.df$KLEVEL), max(dig_plot.df$KLEVEL)),label = "KLEVEL", values = dig_plot.df$KLEVEL),
#     list(range = c(min(dig_plot.df$CREAT), max(dig_plot.df$CREAT)), label = "CREAT", values = dig_plot.df$CREAT),
#     list(range = c(min(dig_plot.df$DIABP), max(dig_plot.df$DIABP)), label = "DIABP", values = dig_plot.df$DIABP),
#     list(range = c(min(dig_plot.df$SYSBP), max(dig_plot.df$SYSBP)), label = "SYSBP", values = dig_plot.df$SYSBP),
#     list(tickvals = c(1, 2), ticktext = c('Male', 'Female'), label = "SEX", values = dig_plot.df$SEX)
#   ))%>%
#      layout(margin = list(t = 100), ##bottom margin in pixels
#          annotations = 
#            list(x = .5, y = 1.22, #position of text adjust as needed 
#                 text = "Baseline Characteristics for Treatment Group", 
#                showarrow = F,
#                 font=list(size=15, color= "black")))
# 
# 
# # Show the plot
# graph

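Interpretation ….. The digoxin and placebo groups are well balanced on the baseline characteristics: age, sex, BMI, blood pressure, creatinine level and history of hypertension are nearly identical in the two groups. The majority of patients are male (77.7%), and the mean BMI is about 27. One unusual pattern is KLEVEL: the placebo group has a maximum of 434 and a standard deviation of 7.87, against a maximum of 6.30 and a standard deviation of 0.511 in the treatment group, which points to an outlying or mis-entered value; KLEVEL is also missing for 11.8% of patients. Cardiovascular disease (CVD), worsening heart failure (WHF) and hospitalization (HOSP) are somewhat more frequent in the placebo group (54.4% vs 49.9%, 34.7% vs 26.8% and 67.1% vs 64.3%), while DIG is more frequent in the treatment group (2.0% vs 0.9%). Overall, the groups are comparable at baseline, so any difference in outcomes is more likely to be due to the treatment than to baseline imbalance.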
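
Balance on a continuous variable can also be quantified from the summary statistics alone. The sketch below uses the Age row of the baseline table and computes a standardized mean difference (an absolute value below 0.1 is a commonly used rule of thumb for balance) and a Welch t statistic. The variable names are chosen here for illustration, and the inputs are the rounded values from the table, so the results are approximate.

```r
# Summary statistics copied from the baseline table (Age, by treatment group).
m_placebo <- 63.5; sd_placebo <- 10.8; n_placebo <- 3403
m_digoxin <- 63.4; sd_digoxin <- 11.0; n_digoxin <- 3397

# Standardized mean difference: difference in means over the pooled SD.
sd_pooled <- sqrt((sd_placebo^2 + sd_digoxin^2) / 2)
smd <- (m_placebo - m_digoxin) / sd_pooled

# Welch t statistic computed from the same summaries.
se_diff <- sqrt(sd_placebo^2 / n_placebo + sd_digoxin^2 / n_digoxin)
t_stat <- (m_placebo - m_digoxin) / se_diff

c(smd = smd, t = t_stat)
```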

Question 3:

Assess if the overall mortality was affected by the treatment.

## Insert your code here
# grouping, summarizing and mutating the data 
a <- dig_new.df%>%
  group_by(trtmt,death)%>%
  summarise(count = n(), .groups = "drop")%>%
  group_by(trtmt)%>%  # to ensure correct denominator
  mutate(perctage = count / sum(count)*100 )

a
## # A tibble: 4 × 4
## # Groups:   trtmt [2]
##   trtmt     death count perctage
##   <fct>     <fct> <int>    <dbl>
## 1 Placebo   Alive  2209     64.9
## 2 Placebo   Death  1194     35.1
## 3 Treatment Alive  2216     65.2
## 4 Treatment Death  1181     34.8
#table1(~death|trtmt, data = dig_new.df, caption = 'Overall Effect of Treatment on Mortality')
# ggplot
g <-ggplot(data = a, 
       mapping = aes(x = death, y = perctage, fill = death)) +
  geom_bar(stat = "identity", alpha = 0.6) +
  # one panel per treatment group, so percentages are not stacked across groups
  facet_wrap(~trtmt) +
  labs(
    title = "Barplot of Overall Effect of Treatment on Mortality",
     fill = "Mortality",
     caption = "Source: DIG-Digitalis Investigation Group",
     x = "Death",
     y = "Percentage %") +
  scale_fill_manual(values = c("Alive" = "cyan" , "Death" = "lightgreen" ))+
         theme_classic()
# show the plot
ggplotly(g)
[Figure: Barplot of Overall Effect of Treatment on Mortality. x-axis: Death (Alive, Death); y-axis: Percentage %; fill: Mortality (Alive, Death).]

Interpretation ….. Mortality is almost identical in the two groups: 35.1% of placebo patients and 34.8% of treatment patients died, so the treatment does not appear to affect overall mortality.
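
The similarity of the two mortality rates can be checked formally with a chi-squared test on the 2x2 table of counts. This is a sketch using base R and the counts printed in the tibble above; the object names `mortality` and `test` are chosen here for illustration.

```r
# Counts copied from the mortality summary above.
mortality <- matrix(
  c(2209, 1194,
    2216, 1181),
  nrow = 2, byrow = TRUE,
  dimnames = list(trtmt = c("Placebo", "Treatment"),
                  death = c("Alive", "Death"))
)

# Row percentages: share of each treatment group that survived or died.
round(prop.table(mortality, margin = 1) * 100, 1)

# Pearson chi-squared test of independence
# (Yates' continuity correction is applied by default for 2x2 tables).
test <- chisq.test(mortality)
test$p.value
```

A p-value above the chosen significance level (0.05 is the usual convention) would be consistent with the reading that mortality does not differ between the groups.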

Question 4:

Assess if the Cardiovascular disease (CVD) is associated with the mortality overall and also within each treatment group.

## Insert your code here
# table showing overall cvd association with mortality and treatment groups using table1
table1(~cvd|trtmt + death, data = dig_new.df)
              Placebo                         Treatment                       Overall
              Alive           Death           Alive           Death           Alive           Death
              (N=2209)        (N=1194)        (N=2216)        (N=1181)        (N=4425)        (N=2375)
CVD
  No          1150 (52.1%)    403 (33.8%)     1246 (56.2%)    457 (38.7%)     2396 (54.1%)    860 (36.2%)
  Yes         1059 (47.9%)    791 (66.2%)     970 (43.8%)     724 (61.3%)     2029 (45.9%)    1515 (63.8%)
#ggplot
g0 <- ggplot(data = dig_new.df, 
       mapping = aes(x = cvd, fill = death)) +
         facet_wrap(~trtmt) + 
  geom_bar(position="fill" ) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Barplot of Overall Effect of Treatment on Mortality and CVD",
       x ="CVD Status", y = "Percentage",
       fill = "Mortality") +
       scale_fill_manual(values = c("Alive" = "skyblue" , "Death" = "orange" ))+
  theme_classic()

#show plot
ggplotly(g0)
[Figure: Barplot of Overall Effect of Treatment on Mortality and CVD. Panels: Placebo, Treatment; x-axis: CVD Status (No, Yes); y-axis: Percentage (0% to 100%); fill: Mortality (Alive, Death).]

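Interpretation ….. The percentages in the table are column percentages. Among patients who survived, 54.1% had no CVD and 45.9% had CVD; among patients who died, 63.8% had CVD and 36.2% did not. Put the other way round, 42.7% of patients with CVD died (1515 of 3544) compared with 26.4% of patients without CVD (860 of 3256). The same pattern holds within each treatment group (placebo: 42.8% vs 25.9%; treatment: 42.7% vs 26.8%), so CVD is associated with higher mortality regardless of treatment.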
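
The association within each treatment group can be summarised with stratum-specific odds ratios and a Cochran-Mantel-Haenszel test. The sketch below rebuilds the 2x2x2 table from the counts in the table above using base R only; the object names `cvd_death`, `risk` and `odds_ratio` are chosen here for illustration.

```r
# Counts copied from the table above; array() fills the first dimension fastest,
# so each stratum is entered as (No/Alive, Yes/Alive, No/Death, Yes/Death).
cvd_death <- array(
  c(1150, 1059, 403, 791,    # Placebo
    1246,  970, 457, 724),   # Treatment
  dim = c(2, 2, 2),
  dimnames = list(cvd = c("No", "Yes"),
                  death = c("Alive", "Death"),
                  trtmt = c("Placebo", "Treatment"))
)

# Mortality (%) by CVD status within each treatment group.
risk <- apply(cvd_death, 3, function(tab) tab[, "Death"] / rowSums(tab) * 100)
round(risk, 1)

# Odds ratio of death (CVD yes vs no) in each treatment group.
odds_ratio <- apply(cvd_death, 3, function(tab) {
  (tab["Yes", "Death"] * tab["No", "Alive"]) /
    (tab["No", "Death"] * tab["Yes", "Alive"])
})
odds_ratio

# Cochran-Mantel-Haenszel test: CVD vs death, stratified by treatment.
mantelhaen.test(cvd_death)
```

Similar odds ratios in the two strata would suggest that the CVD and mortality association does not depend on the treatment received.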

Question 5:

Assess if hospitalization was affected by the treatment.

## Insert your code here
# table for relation between hospitalization and treatment
label(dig_new.df$hosp) <- "Hospitalization"
table1(~hosp|trtmt,dig_new.df,caption = "Hospitalizations affected by treatment")
Hospitalizations affected by treatment

                   Placebo          Treatment        Overall
                   (N=3403)         (N=3397)         (N=6800)
Hospitalization
  No               1121 (32.9%)     1213 (35.7%)     2334 (34.3%)
  Yes              2282 (67.1%)     2184 (64.3%)     4466 (65.7%)
# ggplot
g1<- ggplot(data = dig_new.df, 
       mapping = aes(x = trtmt, fill = hosp)) +
  geom_bar(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Overall Effect of Treatment on Hospitalizations",
     caption = "Source: DIG-Digitalis Investigation Group",
     x ="Treatment",
     y = "Percentage",
    fill = "Hospitalized")+
  scale_fill_manual(values = c("Yes" = "maroon" , "No" = "lightgreen" )) +
  theme_classic()
#show plot
ggplotly(g1)
[Figure: Overall Effect of Treatment on Hospitalizations. x-axis: Treatment (Placebo, Treatment); y-axis: Percentage (0% to 100%); fill: Hospitalized (No, Yes).]

Interpretation ….. Most patients were hospitalized in both groups (placebo: 67.1%, treatment: 64.3%). Hospitalization was slightly less frequent in the treatment group, a difference of about 2.8 percentage points.
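
Whether a difference of this size is larger than chance variation can be examined with a two-sample test of proportions, which also gives a confidence interval for the difference. A sketch with base R `prop.test()`, using the counts from the table above; the vector names are chosen here for illustration.

```r
# Counts copied from the hospitalization table above.
hospitalized <- c(Placebo = 2282, Treatment = 2184)
randomized   <- c(Placebo = 3403, Treatment = 3397)

res <- prop.test(x = hospitalized, n = randomized)

res$estimate   # proportion hospitalized in each group
res$conf.int   # 95% confidence interval for the difference (placebo minus treatment)
res$p.value
```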

Question 6

Assess if the Worsening heart failure (WHF) is associated with the hospitalizations overall and also within each treatment group:

## Insert your code here
# grouping, summarising and mutating data frame
label(dig_new.df$whf) <- "Worsening heart failure"
m <-dig_new.df%>%
  group_by(whf,hosp,trtmt)%>%
  summarise(count = n(), .groups = "drop")%>%
  group_by(trtmt,hosp)%>%
  mutate(perctage = count / sum(count)*100 )
m
## # A tibble: 6 × 5
## # Groups:   trtmt, hosp [4]
##   whf   hosp  trtmt     count perctage
##   <fct> <fct> <fct>     <int>    <dbl>
## 1 No    No    Placebo    1121    100  
## 2 No    No    Treatment  1213    100  
## 3 No    Yes   Placebo    1102     48.3
## 4 No    Yes   Treatment  1274     58.3
## 5 Yes   Yes   Placebo    1180     51.7
## 6 Yes   Yes   Treatment   910     41.7
# summarizing using table1
table1(~whf|trtmt +hosp, dig_new.df, caption = "Effect of Worsening Heart Failure on Hospitalization in Patients")
Effect of Worsening Heart Failure on Hospitalization in Patients

                           Placebo                        Treatment                      Overall
                           No             Yes             No             Yes             No             Yes
                           (N=1121)       (N=2282)        (N=1213)       (N=2184)        (N=2334)       (N=4466)
Worsening heart failure
  No                       1121 (100%)    1102 (48.3%)    1213 (100%)    1274 (58.3%)    2334 (100%)    2376 (53.2%)
  Yes                      0 (0%)         1180 (51.7%)    0 (0%)         910 (41.7%)     0 (0%)         2090 (46.8%)
#ggplot
g2 <- ggplot(data = dig_new.df, 
       mapping = aes(x = whf, fill = hosp)) +
         facet_wrap(~trtmt) + 
  geom_bar(position="fill" ) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Barplot of Overall Effect of WHF on Hospitalization and Treatment",
       x ="WHF Status", y = "Percentage",
       fill = "Hospitalised") +
       scale_fill_manual(values = c("Yes" = "orchid" , "No" = "coral" ))+
         theme_classic()
#show plot
ggplotly(g2)
[Figure: Barplot of Overall Effect of WHF on Hospitalization and Treatment. Panels: Placebo, Treatment; x-axis: WHF Status (No, Yes); y-axis: Percentage (0% to 100%); fill: Hospitalised (No, Yes).]

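Interpretation ….. Every patient with worsening heart failure was hospitalized: no patient with WHF appears in the non-hospitalized columns. Among hospitalized patients, WHF was recorded for 51.7% in the placebo group but only 41.7% in the treatment group (46.8% overall), so hospitalized patients receiving digoxin were less likely to have worsening heart failure.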
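
The size of the treatment difference in WHF can be expressed as a risk ratio. The sketch below uses the WHF counts per treatment group from the baseline table (1180 of 3403 on placebo, 910 of 3397 on treatment) and a large-sample confidence interval computed on the log scale; the variable names are chosen here for illustration.

```r
# WHF counts and group sizes copied from the tables above.
whf_events <- c(Placebo = 1180, Treatment = 910)
n_group    <- c(Placebo = 3403, Treatment = 3397)

risk <- whf_events / n_group
rr <- unname(risk["Treatment"] / risk["Placebo"])

# Large-sample 95% confidence interval on the log scale.
se_log_rr <- sqrt(sum(1 / whf_events) - sum(1 / n_group))
ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

round(c(risk_ratio = rr, lower = ci[1], upper = ci[2]), 3)
```

A risk ratio below 1 with an interval that excludes 1 would indicate fewer WHF events under treatment than under placebo.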

Question 7:

Create a new variable Month by dividing the variable DEATHDAY by 30 and rounding it to the nearest whole number.

## Insert your code here
# adding a new column to existing data frame using mutate
dig_new1.df <- dig_new.df %>%
  # use the deathday column of the piped data frame rather than reaching back into dig.df
  mutate(Month = round(deathday / 30))
#show data frame
#dig_new1.df
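
A note on `round()`: R follows the IEC 60559 convention and rounds exact halves to the nearest even number, so 0.5 becomes 0 and 2.5 becomes 2. A short sketch with made-up follow-up times (the `deathday` vector here is hypothetical, not taken from the dataset):

```r
# Hypothetical follow-up times in days.
deathday <- c(14, 15, 16, 45, 75, 1438)

month <- round(deathday / 30)

# Compare the exact quotient with the rounded month.
data.frame(deathday, exact = deathday / 30, month)
```

Day 15 therefore maps to month 0 while day 45 maps to month 2, which matters only for values that fall exactly on a half month.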

Question 8:

Summarise the variable Month created above.

## Insert your code here
# calculating minimum, maximum, mean, median and standard deviation using the summarise function
h <- dig_new1.df %>%
  summarise(
    Minimum = min(Month),
    Maximum = max(Month),
    Mean = mean(Month),
    Median = median(Month),
    Std_Deviation = sd(Month)
  )
#show table
h
##   Minimum Maximum     Mean Median Std_Deviation
## 1       0      59 35.44868     38      15.17628
# calculating minimum, maximum, mean, median and standard deviation using table1
table1(~Month, data = dig_new1.df, caption = "Months up to Last Follow Up or Death")
Months up to Last Follow Up or Death

                      Overall
                      (N=6800)
Month
  Mean (SD)           35.4 (15.2)
  Median [Min, Max]   38.0 [0, 59.0]

Interpretation ….. On average, patients were followed for approximately 3 years (mean 35.4 months, median 38 months), with the longest follow-up just under 5 years (maximum 59 months).

Question 9:

Summarise the risk of mortality within each month.

HINT: you may want to use survfit function in Survival package to extract required mortality rate within each month. For an example see here

## Insert your code here
# loading the following packages 
library(knitr)
library(survival)
library(ggsurvfit)

# importing the data

y <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")

#selecting the desired data and adding Month column by using mutate
y<-y%>%
  select(ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP, HYPERTEN, CVD, WHF, DIG, HOSP, HOSPDAYS, DEATH, DEATHDAY)%>%
  janitor::clean_names()%>%
  mutate(month = round(deathday/30))

#using survfit function to estimate survival probabilities

f1 <- survfit(Surv(month, death) ~ 1, data = y)

# graph for survfit; type = "risk" plots cumulative incidence (1 - survival),
# which matches the y-axis label
ggsurvfit(f1, type = "risk", linewidth = 1) +
  labs(x = "Months", y = "Cumulative Incidence")+
  add_risktable()+
  scale_ggsurvfit()
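
The Kaplan-Meier fit can also be tabulated month by month, which gives the risk of mortality within each month directly. The sketch below runs on simulated data (`sim`, generated here for illustration and unrelated to the DIG values) and reads the per-month quantities from the components of the `survfit` object.

```r
library(survival)

# Simulated stand-in for the trial data (not the DIG values).
set.seed(1)
n <- 500
sim <- data.frame(
  month = pmin(round(rexp(n, rate = 1 / 80)), 59),  # follow-up capped at 59 months
  death = rbinom(n, size = 1, prob = 0.35)          # 1 = died, 0 = censored
)

km <- survfit(Surv(month, death) ~ 1, data = sim)

# One row per observed month: risk within the month and cumulative risk.
risk_by_month <- data.frame(
  month           = km$time,
  at_risk         = km$n.risk,
  deaths          = km$n.event,
  monthly_risk    = km$n.event / km$n.risk,   # deaths among those still at risk
  cumulative_risk = 1 - km$surv               # 1 minus Kaplan-Meier survival
)
head(risk_by_month)
```

Replacing `sim` with the trial data frame and its month and death columns would give the corresponding table for the trial.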

Interpretation ….. The cumulative incidence of death increases with time since randomization, and the survival probability decreases correspondingly.

Question 10:

Summarise the risk of mortality within each month and for each treatment group.

## Insert your code here
y <- read.csv("C:/Users/ak/Desktop/gds/r_assignment/DIG.csv")
light<-y%>%
  select(ID, TRTMT, AGE, SEX, BMI, KLEVEL, CREAT, DIABP, SYSBP, HYPERTEN, CVD, WHF, DIG, HOSP, HOSPDAYS, DEATH, DEATHDAY)%>%
  janitor::clean_names()%>%
  mutate(Month = round(deathday/30))

fa <- survfit(Surv(time = light$Month, event = light$death) ~trtmt, data = light) 
fa
## Call: survfit(formula = Surv(time = light$Month, event = light$death) ~ 
##     trtmt, data = light)
## 
##            n events median 0.95LCL 0.95UCL
## trtmt=0 3403   1194     NA      NA      NA
## trtmt=1 3397   1181     NA      NA      NA
ggsurvplot(fa, data =light ,
          size = 1,
          palette = c("black","orange"),
          censor.shape = '|', censor.size = 4,
          conf.int = TRUE,
          pval = TRUE,
          risk.table = TRUE,
          risk.table.col = 'strata',
          # legend labels as a character vector, in the order of the strata (0, 1)
          legend.labs = c("Placebo", "Treatment"),
          risk.table.height = 0.25,
          title = "Mortality Risk per Month and Treatment Group")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
##   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Interpretation ….. The survival curves of the treatment and placebo groups are similar over time; treatment does not show a clear improvement in mortality risk compared with placebo.

Question 11:

Assess the effect of CVD on the risk of mortality within each month and for each treatment group:

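# deaths by CVD status, coloured by treatment group
dig_new1.df%>%
  filter(death == "Death") %>%
ggplot() +
    geom_point(mapping = aes(x = cvd, y = death, colour = trtmt),
               position = "jitter", alpha = 1) +
  # labs() and theme_minimal() must be chained with + to apply to the plot
  labs(x = "Effect of CVD",
       y = "Mortality")+
  theme_minimal()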
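
An alternative way to look at CVD, month and treatment group together is to fit one Kaplan-Meier curve per treatment-by-CVD combination and to compare the CVD groups with a log-rank test stratified by treatment. The sketch below runs on simulated data (`sim`, generated here for illustration and unrelated to the DIG values).

```r
library(survival)

# Simulated stand-in for the trial data (not the DIG values).
set.seed(2)
n <- 400
sim <- data.frame(
  month = pmin(round(rexp(n, rate = 1 / 60)), 59),
  death = rbinom(n, size = 1, prob = 0.4),
  trtmt = factor(sample(c("Placebo", "Treatment"), n, replace = TRUE)),
  cvd   = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# One Kaplan-Meier curve per treatment-by-CVD combination.
km <- survfit(Surv(month, death) ~ trtmt + cvd, data = sim)

# Survival estimates at selected months for each combination.
summary(km, times = c(12, 24, 36))

# Log-rank test for CVD, stratified by treatment group.
survdiff(Surv(month, death) ~ cvd + strata(trtmt), data = sim)
```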

Interpretation ….. Among patients who died, more had cardiovascular disease than not, in both the placebo and the treatment group. Within each CVD category, the two treatment groups contribute a similar number of deaths.

Question 12:

Assess if there is any linear relationship between systolic and diastolic blood pressures? Is this relationship affected by the treatment the patients received or whether patients have hypertension or not?

## Insert your code here
ggplot(data = dig_new1.df,
       mapping = aes(x = diabp, y = sysbp, colour = trtmt)) +
  geom_point() +
  geom_smooth(method = "lm", se = F)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 8 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 8 rows containing missing values or values outside the scale range
## (`geom_point()`).

Interpretation ….. The placebo and digoxin groups are evenly distributed across the range of systolic and diastolic blood pressures.
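
The strength of the linear relationship, and whether it changes with treatment or hypertension status, can be quantified with a correlation coefficient and a linear model with interaction terms. The sketch below runs on simulated data (`sim`, generated here for illustration and unrelated to the DIG values), using the same column names as the analysis above.

```r
# Simulated stand-in for the trial data (not the DIG values).
set.seed(3)
n <- 300
sim <- data.frame(
  diabp    = rnorm(n, mean = 75, sd = 11),
  trtmt    = factor(sample(c("Placebo", "Treatment"), n, replace = TRUE)),
  hyperten = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
sim$sysbp <- 50 + 1.0 * sim$diabp + 5 * (sim$hyperten == "Yes") + rnorm(n, sd = 12)

# Strength of the linear association.
cor(sim$sysbp, sim$diabp, use = "complete.obs")

# Does the relationship differ by treatment or hypertension status?
base_fit <- lm(sysbp ~ diabp, data = sim)
full_fit <- lm(sysbp ~ diabp * trtmt + diabp * hyperten, data = sim)

anova(base_fit, full_fit)   # F-test for the added main effects and interactions
coef(summary(full_fit))     # interaction rows estimate the change in slope
```

The interaction coefficients (`diabp:trtmtTreatment` and `diabp:hypertenYes`) estimate how much the slope changes between groups; values close to zero would indicate parallel lines.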